Duration modeling and memory optimization in a Mandarin TTS system
Abstract
Current speech synthesis efforts, both in research and in applications, are dominated by methods based on the concatenation of spoken units. New progress in concatenative text-to-speech (TTS) technology can come mainly from two directions: reducing the memory footprint so that the system can be integrated into embedded systems, or improving the intelligibility and naturalness of the synthesized speech. In this paper, we focus on memory footprint reduction in a Mandarin TTS system. We show that significant memory reductions can be achieved through duration modeling and memory optimization of the lexicon data. The experimental results indicate that the memory requirements of the duration data and the lexicon can be significantly reduced without affecting speech quality. For practical embedded implementations, this is a significant step towards an efficient TTS engine. The applicability of the approach is verified in the speech synthesis system.
Similar resources
An NN-based Approach to Prosodic for Synthesizing English Words Em
In this paper, a neural network-based approach to generating proper prosodic information for spelling/reading English words embedded in background Chinese texts is discussed. It expands an existing RNN-based prosodic information generator for Mandarin TTS to an RNN-MLP scheme for Mandarin-English mixed-lingual TTS. It first treats each English word as a Chinese word and uses the RNN, trained fo...
Duration refinement by jointly optimizing state and longer unit likelihood
We refine the duration model in HMM-based TTS by extending the work of Wu [1]. The model is refined by jointly maximizing the duration likelihoods of state, phone and syllable units. Both Gaussian and gamma distributions are employed. In synthesis, the state durations are generated by the same joint optimization procedure. By considering the duration of state and longer units jointly, the accum...
Totally data-driven duration modeling based on generalized linear model for Mandarin TTS
This paper proposes a totally data-driven duration modeling method for Mandarin TTS, which uses Generalized Linear Models (GLM) to model duration and stepwise regression to automatically select the attribute set with statistical measurements. This method can get a good tradeoff between model complexity and goodness of fit. Besides, speaking rate is introduced as a new modeling attribute, which ...
Syllable HMM based Mandarin TTS and comparison with concatenative TTS
This paper introduces a Syllable HMM based Mandarin TTS system. 10-state left-to-right HMMs are used to model each syllable. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we utilize the unique consonant/vowel structure of Mandarin syllables to improve the voiced/unvoiced decision of HMM states. Evaluation results s...
High-Quality Prosody Generation in Mandarin Text-to-Speech System
A text-to-speech (TTS) synthesizer is a computer-based system that can automatically read text aloud. Fujitsu is developing a Mandarin TTS system using state-of-the-art technologies. The prosodic structure of synthesized text provides important information for making synthetic speech produced by a TTS system more natural and understandable. This paper describes a global probability estimation m...